Systems and Methods for Generating Compressed Light Field Representation Data using Captured Light Fields, Array Geometry, and Parallax Information

ABSTRACT

Systems and methods for generating compressed light field representation data using captured light fields in accordance with embodiments of the invention are disclosed. In one embodiment, an array camera includes a processor and a memory connected to the processor and configured to store an image processing application, wherein the image processing application configures the processor to obtain image data, wherein the image data includes a set of images including a reference image and at least one alternate view image, generate a depth map based on the image data, determine at least one prediction image based on the reference image and the depth map, compute prediction error data based on the at least one prediction image and the at least one alternate view image, and generate compressed light field representation data based on the reference image, the prediction error data, and the depth map.

CROSS-REFERENCE TO RELATED APPLICATIONS

The current application is a continuation of U.S. patent application Ser. No. 14/186,871, entitled “Systems and Methods for Generating Compressed Light Field Representation Data using Captured Light Fields, Array Geometry, and Parallax Information” to Venkataraman et al., filed on Feb. 21, 2014, which claims priority to U.S. Provisional Patent Application Ser. No. 61/767,520, filed Feb. 21, 2013, and to U.S. Provisional Patent Application Ser. No. 61/786,976, filed Mar. 15, 2013, the disclosures of which are hereby incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates to systems and methods for capturing light fields and more specifically to the efficient representation of captured light fields using compressed light field representation data.

BACKGROUND

Imaging devices, such as cameras, can be used to capture images of portions of the electromagnetic spectrum, such as the visible light spectrum, incident upon an image sensor. For ease of discussion, the term light is generically used to cover radiation across the entire electromagnetic spectrum. In a typical imaging device, light enters through an opening (aperture) at one end of the imaging device and is directed to an image sensor by one or more optical elements such as lenses. The image sensor includes pixels or sensor elements that generate signals upon receiving light via the optical element. Commonly used image sensors include charge-coupled device (CCD) sensors and complementary metal-oxide semiconductor (CMOS) sensors.

Image sensors are devices capable of converting an image into a digital signal. Image sensors utilized in digital cameras are typically made up of an array of pixels. Each pixel in an image sensor is capable of capturing light and converting the captured light into electrical signals. In order to separate the colors of light and capture a color image, a Bayer filter is often placed over the image sensor, filtering the incoming light into its red, blue, and green (RGB) components that are then captured by the image sensor. The RGB signal captured by the image sensor using a Bayer filter can then be processed and a color image can be created.

SUMMARY OF THE INVENTION

Systems and methods for generating compressed light field representation data using captured light fields in accordance with embodiments of the invention are disclosed. In one embodiment, an array camera includes a processor and a memory connected to the processor and configured to store an image processing application, wherein the image processing application configures the processor to obtain image data, wherein the image data includes a set of images including a reference image and at least one alternate view image and each image in the set of images includes a set of pixels, generate a depth map based on the image data, where the depth map describes the distance from the viewpoint of the reference image with respect to objects imaged by pixels within the reference image, determine at least one prediction image based on the reference image and the depth map, where the prediction images correspond to at least one alternate view image, compute prediction error data based on the at least one prediction image and the at least one alternate view image, where a portion of prediction error data describes the difference in photometric information between a pixel in a prediction image and a pixel in at least one alternate view image corresponding to the prediction image, and generate compressed light field representation data based on the reference image, the prediction error data, and the depth map.

In an additional embodiment of the invention, the array camera further includes an array camera module including an imager array having multiple focal planes and an optics array configured to form images through separate apertures on each of the focal planes, wherein the array camera module is configured to communicate with the processor and wherein the obtained image data includes images captured by the imager array.

In another embodiment of the invention, the reference image corresponds to an image captured using one of the focal planes within the imager array.

In yet another additional embodiment of the invention, the at least one alternate view image corresponds to the image data captured using the focal planes within the imager array separate from the focal planes associated with the reference image.

In still another additional embodiment of the invention, the reference image corresponds to a virtual image formed based on the images in the array.

In another embodiment of the invention, the depth map describes the geometrical linkage between the pixels in the reference image and the pixels in the other images in the image array.

In yet still another additional embodiment of the invention, the image processing application configures the processor to perform a parallax detection process to generate the depth map, where the parallax detection process identifies variations in the position of objects within the image data along epipolar lines between the reference image and the at least one alternate view image.

In yet another embodiment of the invention, the image processing application further configures the processor to compress the generated compressed light field representation data.

In still another embodiment of the invention, the generated compressed light field representation data is compressed using JPEG-DX.

In yet still another embodiment of the invention, the image processing application configures the processor to determine prediction error data by identifying at least one pixel in the at least one alternate view image corresponding to a reference pixel in the reference image, determining fractional pixel locations within the identified at least one pixel, where a fractional pixel location maps to a plurality of pixels in at least one alternate view image, and mapping fractional pixel locations to a specific pixel location within the alternate view image having a determined fractional pixel location.

In yet another additional embodiment of the invention, the mapping of fractional pixel locations is determined as the nearest neighbor pixel within the alternate view image.

In still another additional embodiment of the invention, the image processing application configures the processor to map the fractional pixel locations based on the depth map, where the pixel in the alternate view image is likely to be similar based on its proximity to the corresponding pixel location determined using the depth map of the reference image.

In yet still another additional embodiment of the invention, the image processing application further configures the processor to identify areas of low confidence within the computed prediction images based on the at least one alternate view image, the reference image, and the depth map, where an area of low confidence indicates areas in the reference viewpoint where the pixels in the determined prediction image may not photometrically correspond to the corresponding pixels in the alternate view image.

In another embodiment of the invention, the depth map further comprises a confidence map describing areas of low confidence within the depth map.

In yet another embodiment of the invention, the image processing application further configures the processor to disregard identified areas of low confidence.

In still another embodiment of the invention, the image processing application further configures the processor to identify at least one additional reference image within the image data, where the at least one additional reference image is separate from the reference image, determine at least one supplemental prediction image based on the reference image, the at least one additional reference image, and the depth map, and compute the supplemental prediction error data based on the at least one additional reference image and the at least one supplemental prediction image, and the generated compressed light field representation data further includes the supplemental prediction error data.

In yet still another embodiment of the invention, the generated compressed light field representation data further includes the at least one additional reference image.

In yet another additional embodiment of the invention, the image processing application configures the processor to identify the at least one additional reference image by generating an initial additional reference image based on the reference image and the depth map, where the initial additional reference image includes pixels projected from the viewpoint of the reference image based on the depth map, and forming the additional reference image based on the initial additional reference image and the prediction error data, where the additional reference image comprises pixels based on interpolations of pixels propagated from the reference image and the prediction error data.

In another embodiment of the invention, the prediction error data is decoded based on the reference image prior to the formation of the additional reference image.

Still another embodiment of the invention includes a method for generating compressed light field representation data including obtaining image data using an array camera, where the image data includes a set of images including a reference image and at least one alternate view image and the images in the set of images include a set of pixels, generating a depth map based on the image data using the array camera, where the depth map describes the distance from the viewpoint of the reference image with respect to objects imaged by pixels within the reference image based on the alternate view images, determining a set of prediction images based on the reference image and the depth map using the array camera, where a prediction image in the set of prediction images is a representation of a corresponding alternate view image in the at least one alternate view image, computing prediction error data by calculating the difference between a prediction image in the set of prediction images and the corresponding alternate view image using the array camera, where the prediction error data describes the difference in photometric information between a pixel in the reference image and a pixel in an alternate view image, and generating compressed light field representation data based on the reference image, the prediction error data, and the depth map using the array camera.

In yet another additional embodiment of the invention, the reference image is a virtual image interpolated from a virtual viewpoint within the image data.

In still another additional embodiment of the invention, determining the set of predicted images further includes identifying at least one pixel in the at least one alternate view image corresponding to a reference pixel in the reference image using the array camera, determining fractional pixel locations within the identified at least one pixel using the array camera, where a fractional pixel location maps to a plurality of pixels in at least one alternate view image, and mapping fractional pixel locations to a specific pixel location within the alternate view image having a determined fractional pixel location using the array camera.

In yet still another embodiment of the invention, the method further includes identifying areas of low confidence within the computed prediction images based on the at least one alternate view image, the reference image, and the depth map using the array camera, where an area of low confidence indicates areas in the reference viewpoint where the pixels in the determined prediction image may not photometrically correspond to the corresponding pixels in the alternate view image.

In yet another additional embodiment of the invention, the method further includes identifying at least one additional reference image within the image data using the array camera, where the at least one additional reference image is separate from the reference image, determining at least one supplemental prediction image based on the reference image, the at least one additional reference image, and the depth map using the array camera, and computing the supplemental prediction error data based on the at least one additional reference image and the at least one supplemental prediction image using the array camera, where the generated compressed light field representation data further includes the supplemental prediction error data.

In still another additional embodiment of the invention, identifying the at least one additional reference image includes generating an initial additional reference image based on the reference image and the depth map using the array camera, where the initial additional reference image includes pixels projected from the viewpoint of the reference image based on the depth map, and forming the additional reference image based on the initial additional reference image and the prediction error data using the array camera, where the additional reference image comprises pixels based on interpolations of pixels propagated from the reference image and the prediction error data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a system diagram of an array camera including a 5×5 imager array with storage hardware connected with a processor in accordance with an embodiment of the invention.

FIG. 2 is a flow chart conceptually illustrating a process for capturing and processing light fields in accordance with an embodiment of the invention.

FIG. 3 is a flow chart conceptually illustrating a process for generating compressed light field representation data in accordance with an embodiment of the invention.

FIG. 4A is a conceptual illustration of a reference image in a 4×4 array of images and corresponding epipolar lines in accordance with an embodiment of the invention.

FIG. 4B is a conceptual illustration of multiple reference images in a 4×4 array of images in accordance with an embodiment of the invention.

FIG. 5 is a conceptual illustration of a prediction error histogram in accordance with an embodiment of the invention.

FIG. 6 is a flow chart conceptually illustrating a process for decoding compressed light field representation data in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

Turning now to the drawings, systems and methods for generating compressed light field representation data using captured light fields in accordance with embodiments of the invention are illustrated. Array cameras, such as those described in U.S. patent application Ser. No. 12/935,504, entitled “Capturing and Processing of Images using Monolithic Camera Array with Heterogeneous Imagers” to Venkataraman et al., can be utilized to capture light fields and store the captured light fields. Captured light fields contain image data from an array of images of a scene captured from multiple points of view so that each image samples the light field of the same region within the scene (as opposed to a mosaic of images that sample partially overlapping regions of a scene). It should be noted that any configuration of images, including two-dimensional arrays, non-rectangular arrays, sparse arrays, and subsets of arrays of images, could be utilized as appropriate to the requirements of specific embodiments of the invention. In a variety of embodiments, image data for a specific image that forms part of a captured light field describes a two-dimensional array of pixels. Storing all of the image data for the images in a captured light field can consume a disproportionate amount of storage space, limiting the number of light field images that can be stored within a fixed capacity storage device and increasing the amount of data transfer involved in transmitting a captured light field. Array cameras in accordance with many embodiments of the invention are configured to process captured light fields and generate data describing correlations between the images in the captured light field. Based on the image correlation data, some or all of the image data in the captured light field can be discarded, affording more efficient storage of the captured light fields as compressed light field representation data. Additionally, this process can be decoupled from the capturing of light fields to enable the efficient use of the hardware resources present in the array camera.

In many embodiments, each image in a captured light field is from a different viewpoint. Due to the different viewpoint of each of the images, parallax results in variations in the position of objects within the images of the scene. The disparity between corresponding pixels in images in a captured light field can be utilized to determine the distance to an object imaged by the corresponding pixels. Conversely, distance can be used to estimate the location of a corresponding pixel in another image. Processes that can be utilized to detect parallax and generate depth maps in accordance with embodiments of the invention are disclosed in U.S. patent application Ser. No. 13/972,881 entitled “Systems and Methods for Parallax Detection and Correction in Images Captured Using Array Cameras that Contain Occlusions using Subsets of Images to Perform Depth Estimation” to Venkataraman et al. In many embodiments, a depth map is metadata describing the distance from the viewpoint from which an image is captured (or, in the case of super-resolution processing, synthesized) with respect to objects imaged by pixels within the image. Additionally, the depth map can also describe the geometrical linkage between pixels in the reference image and pixels in all other images in the array.
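The disparity-depth relationship at the heart of this process can be illustrated with a short sketch. The following Python fragment is a minimal illustration only, assuming a rectified two-camera geometry with a known baseline and focal length; the function names and parameters are hypothetical and not drawn from the patent.

    # Minimal sketch (not the patent's method): converting the observed disparity
    # between corresponding pixels in two rectified views into a depth estimate,
    # and back. Assumes a pinhole model with the baseline in meters and the focal
    # length in pixels.

    def disparity_to_depth(disparity_px, baseline_m, focal_length_px):
        """Depth of the imaged object; larger disparity means a closer object."""
        if disparity_px <= 0:
            return float("inf")  # zero disparity corresponds to an object at infinity
        return baseline_m * focal_length_px / disparity_px

    def depth_to_disparity(depth_m, baseline_m, focal_length_px):
        """Conversely, a known depth predicts where the corresponding pixel falls
        along the epipolar line in another image."""
        return baseline_m * focal_length_px / depth_m

    # Example: a 5 px disparity with a 2 cm baseline and a 1000 px focal length
    # implies an object roughly 4 m from the camera.
    print(disparity_to_depth(5.0, 0.02, 1000.0))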

Array cameras in accordance with several embodiments of the invention are configured to process the images in a captured light field using a reference image selected from the captured array of images. In a variety of embodiments, the reference image is a synthetic image generated from the captured images, such as a synthetic viewpoint generated from a focal plane (e.g. a camera) that does not physically exist in the imager array. The remaining images can be considered to be images of alternate views of the scene relative to the viewpoint of the reference image. Using the reference image, array cameras in accordance with embodiments of the invention can generate a depth map using processes similar to those described above in U.S. patent application Ser. No. 13/972,881, and the depth map can be used to generate a set of prediction images describing the pixel positions within one or more of the alternate view images that correspond to specific pixels within the reference image. The relative locations of pixels in the alternate view images can be predicted along epipolar lines projected based on the configuration of the cameras (e.g. the calibration of the physical properties of the imager array in the array camera and their relationship to the reference viewpoint of the array camera) that captured the images. The predicted location of the pixels along the epipolar lines is a function of the distance from the reference viewpoint to the object imaged by the corresponding pixel in the reference image. In a number of embodiments, the predicted location is additionally a function of any calibration parameters intrinsic or extrinsic to the physical imager array. The prediction images exploit the correlation between the images in the captured light field by describing the differences between the value of a pixel in the reference image and pixels adjacent to corresponding disparity-shifted pixel locations in the other alternate view images in the captured light field. The disparity-shifted pixel positions are often determined with fractional pixel precision (e.g. an integer position in the reference image is mapped to a fractional position in the alternate view image) based on a depth map of the reference image. Significant compression of the image data forming the images of a captured light field can be achieved by selecting one reference image, generating prediction images with respect to the reference image using the depth map information relating the reference and alternate view images, generating prediction error data describing the differences between the predicted images and the alternate view images, and discarding the alternate view images. In a variety of embodiments, multiple reference images are utilized to generate prediction error data that describes the photometric differences between pixels in alternate view images adjacent to corresponding disparity-shifted pixel locations and pixels in one or more of the reference images.

It should also be noted that while, in a variety of embodiments, the reference image corresponds to an image in the captured array of images, virtual (e.g. synthetic) images corresponding to a virtual viewpoint within the captured light field can also be utilized as the reference image in accordance with embodiments of the invention. For example, a virtual red image, a virtual green image, and/or a virtual blue image can be used to form a reference image for each respective color channel and used as a starting point for forming predicted images for the alternate view images of each respective color channel. In many embodiments, a color channel includes a set of images within the image array corresponding to a particular color, potentially as captured by the focal planes within the imager array. However, in accordance with embodiments of the invention, the reference image for a particular color channel can be taken from a different color channel; for example, an infrared image can be used as the reference image for the green channel within the captured light field.

The reference image(s) and the set of prediction error data stored by an array camera can be referred to as compressed light field representation data. The compressed light field representation data can also include the depth map utilized to generate the prediction error data and/or any other metadata related to the creation of the compressed light field representation data and/or the captured light field. The prediction error data can be compressed using any compression technique, such as discrete cosine transform (DCT) techniques, as appropriate to the requirements of specific embodiments of the invention. The compressed light field representation data can be compressed and stored in a variety of formats. One such file format is the JPEG-DX extension to ISO/IEC 10918-1 described in U.S. patent application Ser. No. 13/631,731, titled “Systems and Methods for Encoding Light Field Image Files” to Venkataraman et al. As can readily be appreciated, the prediction error data can be stored in a similar manner to a depth map as compressed or uncompressed layers and/or metadata within an image file. In a variety of embodiments, array cameras are configured to capture light fields separately from the generation of the compressed light field representation data. For example, the compressed light field representation data can be generated when the array camera is no longer capturing light fields or in the background as the array camera captures additional light fields. Any variety of decoupled processing techniques can be utilized in accordance with the requirements of embodiments of the invention. Many array cameras in accordance with embodiments of the invention are capable of performing a variety of processes that utilize the information contained in the captured light field using the compressed light field representation data.
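One way to picture the pieces that travel together as compressed light field representation data is as a simple container. The sketch below is illustrative only; the patent does not prescribe a data structure, and every field name here is a hypothetical placeholder.

    # Illustrative sketch only: a container grouping the elements the text names
    # as compressed light field representation data. All field names are
    # hypothetical.
    from dataclasses import dataclass, field
    from typing import Dict, Optional, Tuple

    import numpy as np

    @dataclass
    class CompressedLightField:
        reference_image: np.ndarray                           # reference view (H x W or H x W x C)
        depth_map: np.ndarray                                 # per-pixel depth from the reference viewpoint
        prediction_error: Dict[Tuple[int, int], np.ndarray]   # (k, l) array position -> signed error image
        confidence_map: Optional[np.ndarray] = None           # optional low-confidence regions of the depth map
        metadata: dict = field(default_factory=dict)          # e.g. timestamps, calibration information

Note that the alternate view images themselves are absent: at decode time they are reconstructed from the reference image, the depth map, and the per-view prediction error.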

In many instances, a captured light field contains image data from an array of images of a scene that sample an object space within the scene in such a way as to provide sampling diversity that can be utilized to synthesize higher resolution images of the object space using super-resolution processes. Systems and methods for performing super-resolution processing on image data captured by an array camera in accordance with embodiments of the invention are disclosed in U.S. patent application Ser. No. 12/967,807 entitled “Systems and Methods for Synthesizing High Resolution Images Using Super-Resolution Processes” to Lelescu et al. Synthesized high resolution images are representations of the scene captured in the captured light field. In many instances, the process of synthesizing a high resolution image may result in a single image, a stereoscopic pair of images that can be used to display three dimensional (3D) information via an appropriate 3D display, and/or a variety of images from different viewpoints. The process of synthesizing high resolution images from lower resolution image data captured by an array camera module in an array camera typically involves performing parallax detection and correction to reduce the effects of disparity between the images captured by each of the cameras in the array camera module. By using the reference image(s), the set of prediction error data, and/or the depth map contained in compressed light field representation data, high resolution images can be synthesized separately from the parallax detection and correction process, thereby alleviating the need to store and process the captured light field until the super-resolution process can be performed. Additionally, the parallax detection process can be optimized to improve speed or efficiency of compression. Once the compressed data is decoded, a parallax process can be re-run at a different (i.e. higher) precision using the reconstructed images. In this way, an initial super-resolution process can be performed in an efficient manner (such as on an array camera, where the processing power of the device limits the ability to perform a high precision parallax process in real-time) and, at a later time, a higher precision parallax process can be performed to generate any of a variety of data, including a second set of compressed light field representation data and/or other captured light field image data, or perform any processing that relies on the captured light field. Later times include, but are not limited to, times when the array camera is not capturing light fields and/or when the compressed light field representation data has been transmitted to a separate image processing device with more advanced processing capabilities.

The disclosures of each of U.S. patent application Ser. Nos. 12/935,504, 12/967,807, 13/631,731, and 13/972,881 are hereby incorporated by reference in their entirety. Although the systems and methods described are with respect to array cameras configured to both capture and process captured light fields, devices that are configured to obtain light fields captured using a different device and process the received data can be utilized in accordance with the requirements of a variety of embodiments of the invention. Additionally, any of the various systems and processes described herein can be performed in sequence, in alternative sequences, and/or in parallel (e.g. on different computing devices) in order to achieve similar results in a manner that is more appropriate to the requirements of a specific application of the invention. Systems and methods for capturing light fields and generating compressed light field representation data using the captured light fields in accordance with embodiments of the invention are described below.

Array Camera Architectures

As described above, array cameras are capable of capturing and processing light fields and can be configured to generate compressed light field representation data using captured light fields in accordance with many embodiments of the invention. An array camera including an imager array in accordance with an embodiment of the invention is illustrated in FIG. 1. The array camera 100 includes an array camera module including an imager array 102 having multiple focal planes 104 and an optics array configured to form images through separate apertures on each of the focal planes. The imager array 102 is configured to communicate with a processor 108. In accordance with many embodiments of the invention, the processor 108 is configured to read out image data captured by the imager array 102 and generate compressed light field representation data using the image data captured by the imager array 102. Imager arrays including multiple focal planes are discussed in U.S. patent application Ser. No. 13/106,797, entitled “Architectures for System on Chip Array Cameras” to McMahon et al., the entirety of which is hereby incorporated by reference.

In the illustrated embodiment, the focal planes are configured in a 5×5 array. In other embodiments, any of a variety of array configurations can be utilized, including linear arrays, non-rectangular arrays, and subsets of an array, as appropriate to the requirements of specific embodiments of the invention. Each focal plane 104 of the imager array is capable of capturing image data from an image of the scene formed through a distinct aperture. Typically, each focal plane includes a plurality of rows of pixels that also forms a plurality of columns of pixels, and each focal plane is contained within a region of the imager that does not contain pixels from another focal plane. The pixels or sensor elements utilized in the focal planes can be individual light sensing elements such as, but not limited to, traditional CIS (CMOS Image Sensor) pixels, CCD (charge-coupled device) pixels, high dynamic range sensor elements, multispectral sensor elements, and/or any other structure configured to generate an electrical signal indicative of light incident on the structure. In many embodiments, the sensor elements of each focal plane have similar physical properties and receive light via the same optical channel and color filter (where present). In other embodiments, the sensor elements have different characteristics and, in many instances, the characteristics of the sensor elements are related to the color filter applied to each sensor element. In a variety of embodiments, a Bayer filter pattern of light filters can be applied to one or more of the focal planes 104. In a number of embodiments, the sensor elements are optimized to respond to light at a particular wavelength without utilizing a color filter. It should be noted that any optical channel, including those in non-visible portions of the electromagnetic spectrum (such as infrared), can be sensed by the focal planes as appropriate to the requirements of particular embodiments of the invention.

In several embodiments, information captured by one or more focal planes 104 is read out of the imager array 102 as packets of image data. In many embodiments, a packet of image data contains one or more pixels from a row of pixels captured from each of one or more of the focal planes 104. Packets of image data may contain other groupings of captured pixels, such as one or more pixels captured from a column of pixels in each of one or more focal planes 104 and/or a random sampling of pixels. Systems and methods for reading out image data from array cameras that can be utilized in array cameras configured in accordance with embodiments of the invention are described in U.S. Pat. No. 8,305,456, entitled “Systems and Methods for Transmitting and Receiving Array Camera Image Data” to McMahon, the entirety of which is hereby incorporated by reference. In several embodiments, the packets of image data are used to create a two-dimensional array of images representing the light field as captured from the one or more focal planes 104. In many embodiments, one or more of the images in the array of images are associated with a particular color; this color can be the same color associated with the focal plane 104 corresponding to the viewpoint of the image or a different color. The processor 108 can be configured to immediately process the captured light field from the one or more focal planes and/or the processor 108 can store the captured light field and later process the captured light field. In a number of embodiments, the processor 108 is configured to offload the captured light fields to an external device for processing.
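As a rough illustration of how such packetized readout might be represented in software, the following sketch groups pixels by focal plane and row; it is purely hypothetical and is not a description of the readout protocol of U.S. Pat. No. 8,305,456.

    # Hypothetical sketch of packetized readout: each packet carries pixels from
    # one row of one focal plane, tagged with enough indices to reassemble the
    # two-dimensional array of images.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ImageDataPacket:
        focal_plane: int      # index of the focal plane within the imager array
        row: int              # row of pixels this packet was read from
        col_start: int        # column offset of the first pixel in the packet
        pixels: List[int]     # raw pixel values read out of the sensor
        timestamp_us: int     # image packet timestamp used in later calibration

    def assemble_image(packets, height, width, plane):
        """Reassemble one focal plane's image from its packets."""
        image = [[0] * width for _ in range(height)]
        for p in packets:
            if p.focal_plane == plane:
                image[p.row][p.col_start:p.col_start + len(p.pixels)] = p.pixels
        return image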

The processing of captured light fields includes determining correspondences between pixels in the captured light field. In several embodiments, the pixels in the packets of image data are geometrically correlated based on a variety of factors, including, but not limited to, the characteristics of one or more of the focal planes 104. The calibration of imager arrays to determine the characteristics of focal planes is disclosed in U.S. patent application Ser. No. 12/967,807, incorporated by reference above. In several embodiments, processor 108 is configured (such as by an image processing application) to perform parallax detection on the captured light field to determine corresponding pixel locations along epipolar lines between a reference image and alternate view images within the captured light field. The process of performing parallax detection also involves generating a depth map with respect to the reference image (e.g. a reference viewpoint that may include synthesized ‘virtual’ viewpoints where a physical camera in the array does not exist). In a variety of embodiments, the captured packets of image data are associated with image packet timestamps, and geometric calibration and/or photometric calibration between pixels in the packets of image data utilizes the associated image packet timestamps. Corresponding pixel locations and differences between pixels in the reference image and the alternate view image(s) can be utilized by processor 108 to determine a prediction for at least some of the pixels of the alternate view image(s). In many embodiments, the corresponding pixel locations are determined with sub-pixel precision.

The prediction image can be formed by propagating pixels from the reference image(s) to the corresponding pixel locations in the alternate view grid. In many embodiments, the corresponding pixel locations in the alternate view grid are fractional positions (e.g. sub-pixel positions). Once the pixels from the reference image(s) are propagated to the corresponding positions in the alternate view grid, a predicted image (from the same perspective as the alternate view image) is formed by calculating prediction values for the integer grid points in the alternate view grid based on the propagated pixel values from the reference image. The predicted image values in the integer grid of the alternate view image can be determined by interpolating from multiple pixels propagated from the reference image in the neighborhood of the integer pixel grid position in the predicted image. In many embodiments, the predicted image values on the integer grid points of the alternate view image are interpolated through an iterative interpolation scheme (e.g. a combination of linear or non-linear interpolations) that progressively fills in ‘holes’ or missing data at integer positions in the predicted alternate view image grid. In a variety of embodiments, integer grid locations in the predicted image can be filled using set selection criteria. In several embodiments, pixels propagated from the reference image within a particular radius of the integer pixel position can form a set, and the pixel in the set closest to the mean of the distribution of pixels in the region can be selected as the predictor. In a number of embodiments, within the same set, the pixel that lands nearest to the integer grid point may be used as the predictor (i.e. nearest neighbor interpolation). In another embodiment, an average of the N nearest neighbors may be used as the predicted image value at the integer grid point. However, it should be noted that the predicted value can be any function (linear or non-linear) that interpolates or inpaints values in the predicted image based on reference pixel values in some neighborhood of the integer grid position in the predicted image.
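A compact way to see these steps together is the following sketch: propagate each reference pixel along its depth-implied disparity to a fractional location in the alternate view grid, then fill each integer grid point from the nearest propagated sample. This is a minimal nearest-neighbor variant under an assumed rectified, horizontal-baseline geometry; the helper name and the single-baseline disparity model are assumptions, not the patent's method.

    # Minimal sketch (assumed geometry, not the patent's method): predict an
    # alternate view by propagating reference pixels to the fractional positions
    # implied by the depth map, then fill integer grid points from the nearest
    # propagated sample (nearest-neighbor interpolation).
    import numpy as np

    def predict_alternate_view(reference, depth, baseline, focal_length):
        """reference: H x W intensities; depth: H x W positive depths from the
        reference viewpoint. Assumes horizontal epipolar lines, so disparity is
        purely an x shift. Returns the predicted image and a mask of filled
        integer grid points."""
        h, w = reference.shape
        predicted = np.zeros((h, w), dtype=float)
        best_dist = np.full((h, w), np.inf)   # distance to the nearest propagated sample
        for y in range(h):
            for x in range(w):
                disparity = baseline * focal_length / depth[y, x]  # fractional shift
                xf = x + disparity                                 # fractional x in the alternate grid
                xi = int(round(xf))                                # nearest integer grid point
                if 0 <= xi < w:
                    dist = abs(xf - xi)
                    if dist < best_dist[y, xi]:                    # keep the nearest propagated pixel
                        best_dist[y, xi] = dist
                        predicted[y, xi] = reference[y, x]
        filled = np.isfinite(best_dist)
        # Remaining holes (e.g. occlusions) would be filled by the further
        # interpolation passes described above.
        return predicted, filled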

The prediction error data itself can be determined by performing a photometric comparison of the pixel values from the predicted image (e.g. the predicted alternate view image based on the reference image and the depth map) and the corresponding alternate view image. The prediction error data represents the difference between the predicted alternate view image, based on the reference image and the depth map, and the actual alternate view image that must later be reproduced in the decoding process.

Due to variations in the optics and the pixels used to capture the image data, sampling diversity, and/or aliasing, the processor 108 is configured to anticipate photometric differences between corresponding pixels. These photometric differences may be further increased in the compared pixels because the nearest neighbor does not directly correspond to the pixel in the reference image. In many embodiments, the compression is lossless and the full captured light field can be reconstructed using the reference image, the depth map, and the prediction error data. In other embodiments, a lossy compression is used and an approximation of the full captured light field can be reconstructed. In this way, the pixel values of the alternate view images are available for use in super-resolution processing, enabling the super-resolution processes to exploit the sampling diversity and/or aliasing that may be reflected in the alternate view images. In a number of embodiments, the prediction images are sparse images. In several embodiments, sparse images contain predictions for some subset of points (e.g. pixels) in the space of the alternate view images. Processor 108 is further configured to generate compressed light field representation data using the prediction error data, the reference image, and the depth map. Other data, such as one or more image packet timestamps, can be included as metadata associated with the compressed light field representation data as appropriate to the requirements of specific array cameras in accordance with embodiments of the invention. In several embodiments, the prediction error data and the reference image are compressed via lossless and/or lossy image compression techniques. In a variety of embodiments, an image processing application configures processor 108 to perform a variety of operations using the compressed light field representation data, including, but not limited to, synthesizing high resolution images using a super-resolution process. Other operations can be performed using the compressed light field representation data in accordance with a variety of embodiments of the invention.
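The lossless round trip described here reduces to a simple identity: the encoder stores the signed difference between the actual alternate view and its prediction, and the decoder re-derives the same prediction and adds the stored error back. A sketch under the same assumed geometry, reusing the hypothetical predict_alternate_view() helper from the earlier sketch:

    # Sketch of the lossless round trip; assumes the hypothetical
    # predict_alternate_view() from the earlier sketch is in scope.
    def encode_view(alternate, reference, depth, baseline, focal_length):
        predicted, _ = predict_alternate_view(reference, depth, baseline, focal_length)
        return alternate - predicted    # signed prediction error (this is what gets stored)

    def decode_view(error, reference, depth, baseline, focal_length):
        predicted, _ = predict_alternate_view(reference, depth, baseline, focal_length)
        return predicted + error        # exact reconstruction when the error is stored losslessly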

Although a specific array camera configured to capture light fields and generate compressed light field representation data is illustrated in FIG. 1, alternative architectures, including those containing sensors measuring the movement of the array camera as light fields are captured, can also be utilized as appropriate to the requirements of specific applications in accordance with embodiments of the invention. Systems and methods for capturing and processing light fields in accordance with embodiments of the invention are discussed below.

Processing and Interacting with Captured Light Fields

A captured light field, as an array of images, can consume a significant amount of storage space. Generating compressed light field representation data using the captured light field while reducing the storage space utilized can be a processor-intensive task. A variety of array cameras in accordance with embodiments of the invention lack the processing power to simultaneously capture and process light fields while maintaining adequate performance for one or both of the operations. Array cameras in accordance with several embodiments of the invention are configured to separately obtain a captured light field and generate compressed light field representation data using the captured light field, allowing the array camera to quickly capture light fields and efficiently process those light fields as the processing power becomes available and/or the compressed light field representation data is needed. A process for processing and interacting with captured light fields in accordance with an embodiment of the invention is illustrated in FIG. 2. The process 200 includes reading (210) image data from a captured light field out of an array camera module. Compressed light field representation data is generated (212) from the captured light field. In a variety of embodiments, a high resolution image is synthesized (214) using the compressed light field representation data. In several embodiments, users can then interact (216) with the synthesized high resolution image in a variety of ways appropriate to the requirements of a specific application in accordance with embodiments of the invention.

In many embodiments, a captured light field is obtained (210) using an imager array, and a processor in the array camera generates (212) the compressed light field representation data. In a number of embodiments, a captured light field is obtained (210) from a separate device. In several embodiments, the captured light field is obtained (210) and compressed light field representation data is generated (212) as part of a single capture operation. In a variety of embodiments, obtaining (210) the captured light field and generating (212) the compressed light field representation data occur at disparate times.

Generating (212) the compressed light field representation data includes normalizing the obtained (210) captured light field and generating a depth map for the obtained (210) captured light field using geometric calibration data and photometric calibration data. In many embodiments, parallax detection processes, such as those disclosed in U.S. patent application Ser. No. 13/972,881, are utilized to generate a depth map and prediction error data describing the correlation between pixels in the captured light field from the perspective of one or more reference images. Processes other than those disclosed in U.S. patent application Ser. No. 13/972,881 can be utilized in accordance with many embodiments of the invention. The generated (212) compressed light field representation data includes the prediction error data, the reference images, and the depth map. Additional metadata, such as timestamps, location information, and sensor information, can be included in the generated (212) compressed light field representation data as appropriate to the requirements of specific applications in accordance with embodiments of the invention. In many embodiments, the generated (212) compressed light field representation data is compressed using lossy and/or lossless compression techniques. The generated (212) compressed light field representation data can be stored in a variety of formats, such as the JPEG-DX standard. In several embodiments, the alternate view images in the obtained (210) captured light field are not stored in the generated (212) compressed light field representation data.

In a number of embodiments, synthesizing (214) a high resolution image utilizes the reference image(s), the prediction error data, and the depth map in the generated (212) compressed light field representation data. In a variety of embodiments, the reference images, the prediction error data, and the depth map are utilized to reconstruct the array of images (or an approximation of the images) to synthesize (214) a high resolution image using a super-resolution process. A high resolution image can be synthesized (214) using the array of images representing the captured light field reconstructed based on the generated (212) compressed light field representation data. However, in a number of embodiments synthesizing (214) a high resolution image using the generated (212) compressed light field representation data includes reconstructing (e.g. decoding) the array of images using the compressed light field representation data once the captured light field is to be viewed, such as in an image viewing application running on an array camera or other device. Techniques for decoding compressed light field representation data that can be utilized in accordance with embodiments of the invention are described in more detail below. In several embodiments, high resolution images are synthesized (214) at a variety of resolutions to support different devices and/or varying performance requirements. In a number of embodiments, the synthesis (214) of a number of high resolution images is part of an image fusion process such as the processes described in U.S. patent application Ser. No. 12/967,807, the disclosure of which is incorporated by reference above.

Many operations can be performed while interacting (216) with synthesized high resolution images, such as, but not limited to, modifying the depth of field of the synthesized high resolution image, changing the focal plane of the synthesized high resolution image, recoloring the synthesized high resolution image, and detecting objects within the synthesized high resolution image. Systems and methods for interacting (216) with compressed light field representation data and synthesized high resolution images that can be utilized in accordance with embodiments of the invention are disclosed in U.S. patent application Ser. No. 13/773,284 to McMahon et al., the entirety of which is hereby incorporated by reference.

Although a specific process for processing and interacting with captured light fields in accordance with an embodiment of the invention is described above with respect to FIG. 2, a variety of image deconvolution processes appropriate to the requirements of specific applications can be utilized in accordance with embodiments of the invention. Processes for generating compressed light field representation data using captured light fields in accordance with embodiments of the invention are discussed below.

Generating Compressed Light Field Representation Data

A process for generating compressed light field representation data in accordance with an embodiment of the invention is illustrated in FIG. 3. The process 300 includes obtaining (310) an array of image data. A reference image viewpoint (e.g. a desired viewpoint for the reference image) is determined (312). Parallax detection is performed (314) to form a depth map from this reference viewpoint. Predicted images are determined (316) corresponding to the alternate view images by propagating pixels from the reference image to the alternate view grid. Prediction error data is computed (318) as the difference between the predicted image and the corresponding alternate view image. Where areas of low confidence are detected (320), supplemental prediction images and supplemental prediction error data are computed (322). In a variety of embodiments, the reference image(s), prediction error data, and/or the depth map are compressed (324). Compressed light field representation data can then be created (326) using the reference image(s), prediction error data, and/or depth map(s).
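Read end to end, the flow of FIG. 3 can be summarized in a short sketch. The fragment below simply strings the stages together under the same assumed horizontal-baseline geometry, reusing the hypothetical predict_alternate_view() helper from the sketch above; the low-confidence handling of steps (320)/(322) and the compression of step (324) are omitted for brevity.

    # Hypothetical end-to-end sketch of the FIG. 3 flow (steps 310-326), omitting
    # low-confidence handling (320)/(322) and final compression (324).
    def generate_compressed_light_field(images, ref_index, depth_map, baselines, focal_length):
        """images: dict mapping (k, l) positions to H x W arrays (310); ref_index
        selects the reference viewpoint (312); depth_map comes from parallax
        detection (314); baselines maps each view to its baseline relative to the
        reference view. All parameters are illustrative assumptions."""
        reference = images[ref_index]
        errors = {}
        for k, alternate in images.items():
            if k == ref_index:
                continue
            predicted, _ = predict_alternate_view(reference, depth_map,
                                                  baselines[k], focal_length)  # (316)
            errors[k] = alternate - predicted                                  # (318)
        return {"reference": reference, "depth": depth_map, "errors": errors}  # (326)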

In a variety of embodiments, the array of images is obtained (310) from a captured light field. In several embodiments, the obtained (310) array of images is packets of image data captured using an imager array. In many embodiments, the determined (312) reference image corresponds to the reference viewpoint of the array of images. Furthermore, the determined (312) reference image can be an arbitrary image (or synthetic image) in the obtained (310) array of images. In a number of embodiments, each image in the obtained (310) array of images is associated with a particular color channel, such as, but not limited to, green, red, and blue. Other colors and/or portions of the electromagnetic spectrum can be associated with each image in the array of images in accordance with a variety of embodiments of the invention. In several embodiments, the determined (312) reference image is a green image in the array of images. Parallax detection is performed (314) with respect to the viewpoint of the determined (312) reference image to locate pixels corresponding to pixels in the reference image by searching along epipolar lines in the alternate view images in the array of images. In a number of embodiments, the parallax detection uses correspondences between cameras that are not co-located with the viewpoint of the reference image(s). In many embodiments, the search area need not be directly along an epipolar line, but rather a region surrounding the epipolar line; this area can be utilized to account for inaccuracies in determining imager calibration parameters and/or the epipolar lines. In several embodiments, parallax detection can be performed (314) with a fixed and/or dynamically determined level of precision; this level of precision can be based on performance requirements and/or desired compression efficiency, the array of images, and/or on the desired level of precision in the result of the performed (314) parallax detection. Additional techniques for performing parallax processes with varying levels of precision are disclosed in U.S. Provisional Patent Application Ser. No. 61/780,974, filed Mar. 13, 2013, the entirety of which is hereby incorporated by reference.

Disparity Information From a Single Reference Image

Turning now to FIG. 4A, a conceptual illustration of a two-dimensional array of images and associated epipolar lines as utilized in determining pixel correspondences in accordance with an embodiment of the invention is shown. The 4×4 array of images 400 includes a reference image 410, a plurality of alternate view images 412, a plurality of epipolar lines 414, and baselines 416 representing the distance between the optical centers of particular pairs of cameras in the array. Performing (314) parallax detection along the epipolar lines 414 calculates disparity information for the pixels in one or more of the alternate view images 412 relative to the corresponding pixels in the reference image 410. In a number of embodiments, the epipolar lines are geometric distortion-compensated epipolar lines between the pixels corresponding to the photosensitive sensors in the focal planes in the imager array that captured the array of images. In several embodiments, the calculation of disparity information first involves the utilization of geometric calibration data so that disparity searches can be directly performed along epipolar lines within the alternate view images. Geometric calibration data can include a variety of information, such as inter- and intra-camera lens distortion data obtained from an array camera calibration process. Other geometric calibration data can be utilized in accordance with a number of embodiments of the invention. In a variety of embodiments, photometric pre-compensation processes are performed on one or more of the images prior to determining the disparity information. A variety of photometric pre-compensation processes, such as vignette correction, can be utilized in accordance with many embodiments of the invention. Although specific techniques for determining disparity are discussed above, any of a variety of techniques appropriate to the requirements of a specific application can be utilized in accordance with embodiments of the invention, such as those disclosed in U.S. patent application Ser. No. 13/972,881, incorporated by reference above.
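To make the disparity search concrete, the following sketch scores candidate disparities along a purely horizontal epipolar line using a sum of absolute differences over a small window. The rectified horizontal geometry, the SAD cost, and the integer search range are all simplifying assumptions, not the calibrated, distortion-compensated search described above.

    # Simplified sketch of a disparity search along an (assumed horizontal)
    # epipolar line using a sum-of-absolute-differences cost. Assumes (y, x) is
    # at least `half` pixels away from the image borders.
    import numpy as np

    def disparity_along_epipolar_line(reference, alternate, y, x, max_disp=16, half=2):
        """Return the integer disparity in [0, max_disp] whose window in the
        alternate view best matches the window around (y, x) in the reference."""
        h, w = reference.shape
        ref_patch = reference[y - half:y + half + 1, x - half:x + half + 1].astype(float)
        best_d, best_cost = 0, np.inf
        for d in range(max_disp + 1):
            xc = x + d                     # candidate position along the epipolar line
            if xc + half >= w:
                break
            alt_patch = alternate[y - half:y + half + 1, xc - half:xc + half + 1].astype(float)
            cost = np.abs(ref_patch - alt_patch).sum()
            if cost < best_cost:
                best_cost, best_d = cost, d
        return best_d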

In a variety of embodiments, performing (314) parallax detection includes generating a depth map describing depth information in the array of images. In many embodiments, the depth map is metadata describing the distance from the reference camera (i.e. viewpoint) to the portion of the scene captured in the pixels (or a subset of the pixels) of an image, determined using the corresponding pixels in some or all of the alternate view images. In several embodiments, candidate corresponding pixels are those pixels in alternate view images that appear along epipolar lines from pixels in the reference image. In a number of embodiments, a depth map is generated using only images in the array of images that are associated with the same color (for example, green) as the reference image. In several embodiments, the depth map is generated using images that share a single color with one another but differ in color from the reference camera. For example, with a green reference image, a depth map can be generated using only the images associated with the color red (or blue) in the array of images. In many embodiments, depth information is determined with respect to multiple colors and combined to generate a depth map; e.g. depth information is determined separately for the subsets of green, red, and blue images in the array of images and a final depth map is generated using a combination of the green depth information, the red depth information, and the blue depth information. In a variety of embodiments, the depth map is generated using information from any set of cameras in the array. In a variety of embodiments, the depth map is generated without respect to colors associated with the images and/or with a combination of colors associated with the images. In several embodiments, performing (314) parallax detection can utilize techniques similar to those described in U.S. patent application Ser. No. 13/972,881, incorporated by reference above. Additionally, non-color images (such as infrared images) can be utilized to generate the depth map as appropriate to the requirements of specific embodiments of the invention.

Although a specific example of a 4×4 array of images that can be utilized to determine disparity information and a depth map from a reference image in the 4×4 array of images is described above with respect to FIG. 4A, any size array, and any set of cameras in that array, can be used to determine disparity information and a depth map in accordance with embodiments of the invention.

Returning now to FIG. 3, depth information determined during the performed (314) parallax detection is used to determine (316) prediction images including pixel location predictions for one or more pixels in the reference image in at least one alternate view image. In several embodiments, the depth map generated during parallax detection can be used to identify pixel locations within alternate view images corresponding to a pixel location within the reference image with fractional pixel precision. In a variety of embodiments, determining (316) a prediction image in the alternate view includes mapping the fractional pixel location to a specific pixel location (or pixel locations) within the pixel grid for the alternate view image. In several embodiments, specific integer grid pixel location(s) in the predicted image for the alternate view are determined as a function of the neighbors within the support region. These functions include, but are not limited to, the nearest neighbor (or a function of the nearest N neighbors) to the integer grid point within the support region. In other embodiments, any other localized fixed or adaptive mapping technique, including (but not limited to) techniques that map based on depth in boundary regions, can be utilized to identify a pixel within an alternate view image for the purpose of generating a prediction for the selected pixel in the alternate view image. Additionally, filtering can be incorporated into the computation of prediction images in order to reduce the amount of prediction error. In several embodiments, the prediction error data is computed (318) from the difference of the prediction image(s) and their respective alternate view image(s). The prediction error data can be utilized in the compression of one or more images in the captured light field. In several embodiments, the computed (318) prediction error data is the signed difference between the values of a pixel in the predicted image and the pixel at the same grid position in the alternate view image. In this way, the prediction error data typically does not reflect the error in the location prediction for a pixel in the reference image relative to a pixel location in the alternate view image. Instead, the prediction error data primarily describes the difference in photometric information between a pixel in the determined prediction image that was generated by propagating pixels from the reference image and a pixel in an alternate view image. Although specific techniques are identified for determining predicted images utilizing correspondence information determined using a depth map, any of a variety of approaches can be utilized for determining prediction images utilizing correspondence information determined using a depth map as appropriate to the requirements of a specific application in accordance with embodiments of the invention.

In a variety of embodiments, virtual red, virtual green, and/or virtual blue reference images can be utilized as a reference image. For example, a depth map can be determined for a particular reference viewpoint that may not correspond to the location of a physical camera in the array. This depth map can be utilized to form a virtual red, virtual green, and/or virtual blue image from the captured light field. These virtual red, virtual green, and/or virtual blue images can then be utilized as the reference image(s) from which to create the prediction images for the alternate view(s) utilized in the processes described above. By way of a second example, one or more virtual red, virtual green, and/or virtual blue images and/or physical red, green, and/or blue images within the array of images can be used as reference images. When forming the prediction error, the depth map can be utilized to form a prediction image from virtual and/or actual reference image(s) and calculate the prediction error with respect to the corresponding alternate view images.

Turning now to FIG. 5, a prediction error histogram 500 conceptually illustrating computed (318) prediction error data between pixels in a predicted image and an alternate view image in accordance with an embodiment of the invention is shown. The prediction error represented by prediction error histogram 500 can be utilized in the compression of the corresponding image data using the computed (318) prediction error data. Although a specific example of a prediction error histogram in accordance with an embodiment of the invention is conceptually illustrated in FIG. 5, any of a variety of prediction errors, including those that have statistical properties differing from those illustrated in FIG. 5, and any other applicable error measurement can be utilized in accordance with the requirements of embodiments of the invention.

Returning now to FIG. 3, the correlation between spatially proximate pixels in an image can be exploited to compare all pixels within a patch of an alternate view image to a pixel and/or a patch from a reference image in several embodiments of the invention. Effectively, the pixels from a region in the predicted image are copied from a patch in the reference image. In this way, a trade-off can be achieved between determining fewer corresponding pixel locations based on the depth and/or generating a lower resolution depth map for a reference image, and encoding a potentially larger range of prediction errors with respect to one or more alternate view images. In a number of embodiments, the process of encoding the prediction error data can be adaptive in the sense that pixels within a region of an alternate view image can be encoded with respect to a specific pixel in a reference image and, in the event that the prediction error exceeds a predetermined threshold, a new pixel from the reference image can be selected that has a corresponding pixel location closer to a pixel in the alternate view image.
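One way such an adaptive fallback could look in Python is sketched below; the candidate ordering, the max-error test, and all names are illustrative assumptions rather than a prescribed encoding rule.

```python
import numpy as np

def encode_patch_adaptive(alt_patch, ref, candidates, threshold):
    """Encode a whole alternate-view patch against a single reference
    pixel; if the worst signed error exceeds `threshold`, fall back to
    the next candidate reference pixel (presumed ordered by how close
    its corresponding location is to the patch)."""
    for (ry, rx) in candidates:
        error = alt_patch - ref[ry, rx]       # one value predicts the patch
        if np.abs(error).max() <= threshold:
            return (ry, rx), error
    # no candidate met the threshold; keep the last (closest) one anyway
    return candidates[-1], alt_patch - ref[candidates[-1]]
```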

Prediction error data for the alternate view is computed (318) using the determined (312) reference image, the determined (316) prediction images, and the depth map. For the reference image p_(ref) and alternate view images p_(k,l), where k,l represents the location of the alternate view image in the array of images p, the depth information provides, for a subset of the images p_(k,l), correspondences of the form

p_(ref)(x, y) := p_(k,l)(i, j)

where (x, y) is the location of a pixel in p_(ref) and (i, j) is the (fractional) location of the pixel in p_(k,l) corresponding to the pixel p_(ref)(x, y), based on the depth information. Using these mappings of subsets of pixels to the alternate viewpoint p_(k,l), a prediction image is determined (316) from the reference pixels that map to p_(k,l). The prediction error data E_(k,l) can be computed (318) between the prediction image for viewpoint p_(k,l) described above and the alternate view image p_(k,l). In a variety of embodiments, the determined (316) prediction images include sparse images. The missing values for the sparsely populated images can be interpolated using populated values within a neighborhood of the missing pixel value. For example, a kernel regression may be applied to the populated values to fill in the missing prediction values. In these cases, the prediction error data E_(k,l) is a representation of the error induced by the interpolation of the missing values.
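As one concrete instance of the kernel regression mentioned above, the following sketch fills NaN-marked holes in a sparse prediction image with a Gaussian-weighted average of nearby populated values; the window radius, the bandwidth, and the NaN convention are assumptions for illustration.

```python
import numpy as np

def fill_holes_kernel(pred, radius=2, sigma=1.0):
    """Nadaraya-Watson style kernel regression: each NaN hole becomes a
    Gaussian-weighted average of the populated values in its window.
    Holes with no populated neighbor are left for a later pass."""
    out = pred.copy()
    h, w = pred.shape
    for y, x in zip(*np.where(np.isnan(pred))):
        num = den = 0.0
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not np.isnan(pred[ny, nx]):
                    wgt = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma * sigma))
                    num += wgt * pred[ny, nx]
                    den += wgt
        if den > 0.0:
            out[y, x] = num / den
    return out
```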

In many embodiments, the determined (316) initial prediction image is a sparsely populated grid of fractionally-positioned points from the reference frame p_(ref) that includes “holes”, that is, locations or regions on the alternate view integer grid that are not occupied by any of the pixels mapped from the reference frame p_(ref). The presence of “holes” can be particularly prevalent in occluded areas, but holes may also occur in non-occlusion regions due to non-idealities in the depth map or due to the fact that many pixels in the reference camera correspond to fractional positions in the alternate view image. In several embodiments, holes in the prediction error data can be filled using the actual value of the pixel at that location in the alternate view image p_(k,l). This is similar to filling the predicted image with a value of zero (or any other null or default value) to ensure that the coded error is equal to the value of the pixel in the alternate view image at that position. In a variety of embodiments, “holes” in a predicted image can be filled using interpolation with predicted values from neighboring pixels to create additional predictions for the holes based on the pixels from the predicted image. As can be readily appreciated, any interpolator can be utilized to create interpolated predicted image pixels from pixels propagated from the reference image. The details of the interpolation scheme used are a parameter of the encoding and decoding process, and the scheme should be applied in both the encoder and decoder to ensure lossless output. In a variety of embodiments, such residuals can provide more efficient compression than encoding holes with absolute values. In several embodiments, the pixels from the reference image p_(ref) do not map to exact grid locations within the prediction image and a mapping that assigns a single pixel value to multiple adjacent pixel locations on the integer grid of the prediction image is used. In this way, there is a possibility that multiple pixels from the reference image p_(ref) may map to the same integer grid location in the prediction image. In this case, pixel stacking rules can be utilized to generate multiple prediction images in which different stacked pixels are used in each image. In many embodiments, if N pixels exist in a pixel stack, then the resulting predicted value could be the mean of the N pixel values in the stack. However, any number of prediction images can be computed and/or any other technique for determining prediction images where multiple pixels map to the same location (i.e. a pixel stack exists) can be utilized as appropriate to the requirements of specific embodiments of the invention. In a variety of embodiments, holes can remain within the predicted images after the initial interpolation; additional interpolation processes can be performed until every location on the integer grid of the predicted image (or a predetermined number of locations) is assigned a pixel value. Any interpolation technique, such as kernel regression or inpainting, can be used to fill the remaining holes as described. In other embodiments, a variety of techniques can be utilized to achieve compression of raw data that involve creating multiple prediction images and/or pieces of prediction error data. Furthermore, any of a variety of interpolation techniques known to those skilled in the art can be utilized to fill holes in a prediction image as appropriate to the requirements of a specific application in accordance with embodiments of the invention.
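The mean-of-stack rule above might look like the following sketch, where `mappings` is an assumed list of ((row, col), value) pairs produced by the pixel propagation step; the data layout is purely illustrative.

```python
import numpy as np
from collections import defaultdict

def resolve_pixel_stacks(mappings, shape):
    """Build a prediction image from propagated pixels, averaging the
    values whenever several reference pixels land on the same integer
    grid location (a "pixel stack"). Unvisited locations remain NaN
    holes for the interpolation passes described above."""
    stacks = defaultdict(list)
    for (ty, tx), value in mappings:
        stacks[(ty, tx)].append(value)
    pred = np.full(shape, np.nan)
    for (ty, tx), values in stacks.items():
        pred[ty, tx] = sum(values) / len(values)   # mean of the stack
    return pred
```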

Prediction Error Data From Multiple Reference Images

In a variety of embodiments, performing (314) parallax detection does not return accurate disparity information for pixels in alternate view images that are occluded, appear in featureless (e.g. textureless) areas relative to the reference image, or where the depth map exhibits other non-idealities such as photometric mismatch. Using the computed (318) prediction error data and/or the depth map and/or a confidence map describing areas of low confidence in the depth map, areas of low confidence can be identified (320). Areas of low confidence indicate areas in the reference viewpoint where the depth measurement may be inaccurate or the pixels may otherwise not photometrically correspond (for example, due to defects in the reference image), leading to potential inefficiencies in compression and/or performance. Low confidence can be determined in a variety of ways, such as identifying areas having a parallax cost function exceeding a threshold value. For example, if the parallax cost function indicates a low cost (e.g. low mismatch), this indicates that the focal planes agree on a particular depth and the pixels appear to correspond. Similarly, a high cost indicates that not all focal planes agree with respect to the depth, and therefore the computed depth is unlikely to correctly represent the locations of objects within the captured light field. However, any of a variety of techniques for identifying areas of low confidence can be utilized as appropriate to the requirements of specific embodiments of the invention, such as those disclosed in U.S. patent application Ser. No. 13/972,881, incorporated by reference above. In many embodiments, these potential inefficiencies are disregarded and no additional action is taken with respect to the identified (320) areas of low confidence. In several embodiments, potential inefficiencies are disregarded by simply encoding the pixels from the alternate view images rather than computing the prediction error. In a number of embodiments, if an area of low confidence (e.g. correspondence mismatch) is identified (320), one or more additional reference images are selected and supplemental prediction images (or portions of supplemental prediction images) are computed (322) from the additional reference images. In several embodiments, additional reference images, or portions of additional reference images, are utilized when objects are detected in the array of images in areas where the computed (318) prediction error data would be large using a single reference image (for example, in an occlusion zone). A large prediction error can be anticipated when objects in a captured light field are close to the imager array, although any situation where a large prediction error is anticipated can be the basis for selecting additional reference images in accordance with embodiments of the invention.
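A trivial sketch of the thresholding policy mentioned above follows; the scalar threshold and the downstream routing are both assumed for illustration.

```python
import numpy as np

def identify_low_confidence(parallax_cost, threshold):
    """Flag reference-viewpoint pixels whose best parallax cost exceeds
    a threshold, i.e. where the focal planes disagree about depth and a
    single-reference prediction is likely to be poor. Flagged pixels can
    be routed to a secondary reference image or simply encoded raw."""
    return parallax_cost > threshold          # True where depth is unreliable
```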

Turning now to FIG. 4B, a conceptual illustration of a two-dimensional array of images with two reference images, as utilized in determining supplemental pixel correspondences in accordance with an embodiment of the invention, is shown. The 4×4 array of images 450 includes a reference image 460, a secondary reference image 466, a plurality of green alternate view images 462, a plurality of red alternate view images 461, a plurality of blue alternate view images 463, a plurality of prediction dependencies 464 extending from the reference image 460, and a plurality of secondary prediction dependencies 470 extending from the secondary reference image 466. A baseline 468 extends from the primary reference image 460 to the secondary reference image 466. Primary prediction images are computed (316) from those alternate view images associated with the reference image 460 by the prediction dependencies 464 utilizing processes similar to those described above. Likewise, supplemental prediction images are computed (322) along secondary epipolar lines from the secondary reference image 466 using those alternate view images associated with the secondary reference image 466 via the secondary prediction dependencies 470. In a number of embodiments, the pixels in the computed (322) supplemental prediction images can be mapped to the pixels in the reference image 460 using baseline 468.

In several embodiments, the alternate view images that are utilized in performing (314) parallax detection are clustered around the respective reference image that is utilized in performing (314) parallax detection. In a variety of embodiments, the images are clustered in a way that reduces the disparity and/or improves pixel correspondence between the clustered images, thereby reducing the number of pixels from the alternate view images that are occluded from the viewpoints of both the reference image and the secondary reference image. In many embodiments, the alternate view images are clustered to the primary reference image 460, the secondary reference image 466, and/or together based on the color associated with the images. For example, if reference image 460 and secondary reference image 466 are green, only the green alternate view images 462 are associated with the reference image 460 and/or the secondary reference image 466. Likewise, the red alternate view images 461 (or the blue alternate view images 463) are associated with each other for the purposes of computing (322) supplemental prediction images and/or performing (314) parallax detection.

In many embodiments, particularly those employing lossless compression techniques, the secondary reference image 466 is predicted using the reference image 460 and the baseline 468 that describes the distance between the optical centers of the reference image and the secondary reference image. In several embodiments, the secondary reference image 466 is selected to reduce the size of the occlusion zones (and thus improve the predictability of pixels); that is, parallax detection is performed and error data is determined as described above using the secondary reference image 466. In many embodiments, the secondary reference image 466 is associated with the same color channel as the reference image 460. A specific example of a two-dimensional array of images with two reference images that can be utilized to compute (322) supplemental prediction images is conceptually illustrated in FIG. 4B; however, any array of images and more than two reference images can be utilized in accordance with embodiments of the invention. For example, supplemental reference images can be selected per color channel. Taking the array illustrated in FIG. 4B, six reference images (one primary reference image for each of the red, blue, and green channels along with one secondary reference image for each of the red, blue, and green channels) can be utilized in the generation of prediction images and the associated prediction error data. Additionally, in a variety of embodiments, a subset of the pixels within the reference image and/or supplemental reference image (e.g. a region or a sub-portion) can be utilized in the calculation of prediction error data utilizing processes similar to those described above.

In a variety of embodiments, particularly those utilizing lossy compression techniques, a variety of coding techniques can be utilized to account for the effects of lossy compression in the reference images when predicting an alternate view image. In several embodiments, before the prediction image and prediction error data for the alternate view image are formed, the reference image is compressed using a lossy compression algorithm. The compressed reference image is then decompressed to form a lossy reference image. The lossy reference image represents the reference image that the decoder will have in the initial stages of decoding. The lossy reference image is used along with the depth map to form a lossy predicted image for the alternate view image. The prediction error data for the alternate view image is then calculated by comparing the lossy predicted image with the alternate view image (e.g. by taking the signed difference of the two images). In this way, when using lossy compression, the prediction error data takes into account the lossy nature of the encoding of the reference image.
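The round-trip can be sketched as follows, with uniform quantization standing in for an arbitrary lossy codec and `predict_fn` standing in for any depth-based warper such as the earlier sketch; both stand-ins are assumptions for illustration.

```python
import numpy as np

def lossy_roundtrip(image, step=8.0):
    """Stand-in for a lossy codec: uniform quantization followed by
    dequantization, so the result matches what a decoder would hold."""
    return np.round(image / step) * step

def encode_alternate_view(ref, alt, depth, baseline, focal, predict_fn):
    """Mirror the decoder: predict the alternate view from the *lossy*
    reference image (not the pristine one), then store the signed
    residual of the alternate view against that lossy prediction."""
    lossy_ref = lossy_roundtrip(ref)             # decoder's reference image
    pred = predict_fn(lossy_ref, depth, baseline, focal)
    pred = np.where(np.isnan(pred), 0.0, pred)   # holes coded as raw values
    residual = alt - pred
    return lossy_ref, residual
```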

In a variety of embodiments, the alternative reference images are based on the reference image. In several embodiments, the reference image used to predict the viewpoint of the alternate reference image undergoes a lossy compression, and a lossy compression is applied to the prediction error data for the alternate reference image. The reference image is then decompressed to generate a lossy reference image, and a lossy predicted image for the alternate reference image is generated from the decompressed reference image. The compressed prediction error data is decompressed to form the lossy prediction error data, which is added to the lossy predicted image to form the lossy predicted alternate reference image. The prediction image and prediction error data for any subsequent alternate view image that depends on the alternate reference image are formed using the lossy predicted alternate reference image. In a number of embodiments, this forms the alternate view image that can be reconstructed utilizing lossy reconstruction techniques as described below. This process can be repeated for each alternate view image as necessary. In this way, prediction error data can be accurately computed (relative to the uncompressed light field data) using the lossy compressed image data.

Returning now to FIG. 3, in a number of embodiments, the determined (312) reference image, along with the computed (318) prediction error data and/or any computed (322) supplemental prediction images (if relevant), is compressed (324). In many embodiments, supplemental prediction error data based on the computed (322) supplemental prediction images is compressed (324). This compression can be lossless or lossy depending on the requirements of a particular embodiment of the invention. When the images are compressed (324), they can be reconstructed (either exactly or approximately depending on the compression (324) technique(s) utilized) by forming a predicted alternate view image from the reference image data and the depth map, and adding the decoded prediction error data to the predicted alternate view image(s) using an image decoder. Additionally, particularly in those embodiments utilizing lossy compression techniques, metadata describing the information lost in the compression of the reference image(s) and/or prediction error data can be stored in the compressed light field representation data. Alternatively, this information can be stored in the prediction error data. This information can be utilized in the decoding of the compressed light field representation data to accurately reconstruct the originally captured images by correcting for the information lost in the lossy compression process. Techniques for decoding losslessly compressed light field representation data in accordance with embodiments of the invention are described in more detail below. In a variety of embodiments, the compression (324) of the images depends on the computed (318) prediction error data. Compressed light field representation data is generated (326) using the determined (312) reference image(s) and the computed (318) prediction error data along with the depth map generated during the performed (314) parallax detection. In those embodiments with multiple reference images, the secondary reference images or portions of the secondary reference images can be included in the compressed light field representation data and/or the secondary reference images can be reconstructed using the computed (318) prediction error data with respect to the reference image. Additional metadata can be included in the generated (326) compressed light field representation data as appropriate to the requirements of a specific application in accordance with embodiments of the invention.
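The reconstruction rule described above (predict, then add the decoded error) reduces to a few lines; as before, `predict_fn` and the NaN hole convention carry over from the earlier sketches and are assumptions, not mandated structure.

```python
import numpy as np

def reconstruct_alternate_view(ref, depth, residual, baseline, focal, predict_fn):
    """Decoder side: re-run the same depth-based prediction from the
    decoded reference image and add back the stored prediction error.
    With losslessly coded residuals this reproduces the alternate view
    exactly; with lossy residuals, approximately."""
    pred = predict_fn(ref, depth, baseline, focal)
    pred = np.where(np.isnan(pred), 0.0, pred)   # holes were coded as raw values
    return pred + residual
```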

In a variety of embodiments, supplemental depth information is incorporated into the depth map and/or as metadata associated with the compressed light field representation data. In a number of embodiments, supplemental depth information is encoded with the additional reference viewpoint(s). In many embodiments, the depth information for each reference viewpoint is calculated during the encoding process using any set of cameras, and those sets may be the same or may differ from viewpoint to viewpoint. In many embodiments, depth for an alternate reference viewpoint is calculated for only sub-regions of the alternate reference viewpoint so that an entire depth map does not need to be encoded for each viewpoint. In many embodiments, a depth map for the alternate reference viewpoint is formed by propagating pixels from the depth map of a primary reference viewpoint. If there are holes in the depth map propagated to the alternate reference viewpoint, they can be filled by interpolating from nearby propagated pixels in the depth map, or through direct detection from the alternate viewpoint. In many embodiments, the depth map for the alternate reference viewpoint can be formed by a combination of propagating depth values from another reference viewpoint, interpolating for missing depth values in the alternate reference viewpoint, and directly detecting regions of particular depth values in the alternate reference viewpoint. In this way, the depth map created by performing (314) parallax detection above can be augmented with depth information generated from alternate reference images.
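A possible forward-warping step for the depth map itself is sketched below, again under the assumed one-dimensional baseline; the keep-the-nearer-surface collision rule is one reasonable choice, not a requirement of the text.

```python
import numpy as np

def propagate_depth(depth, baseline, focal):
    """Forward-warp a primary-viewpoint depth map to an alternate
    reference viewpoint. Locations nothing maps to stay NaN and can be
    filled by interpolation or direct detection, as described above."""
    h, w = depth.shape
    out = np.full((h, w), np.nan)
    for y in range(h):
        for x in range(w):
            xt = int(round(x + focal * baseline / depth[y, x]))
            if 0 <= xt < w:
                # on collision, keep the nearer (smaller-depth) surface
                if np.isnan(out[y, xt]) or depth[y, x] < out[y, xt]:
                    out[y, xt] = depth[y, x]
    return out
```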

A specific process for generating compressed light field representation data in accordance with an embodiment of the invention is described above with respect to FIG. 3; however, a variety of processes appropriate to the requirements of specific applications can be utilized in accordance with embodiments of the invention. In particular, the above processes can be performed using all or a subset of the images in the obtained (310) array of images.

Decoding Compressed Light Field Representation Data

As described above, compressed light field representation data can be utilized to efficiently store captured light fields. However, in order to utilize the compressed light field representation data to perform additional processes on the captured light field (such as parallax processes), the compressed light field representation data needs to be decoded to retrieve the original captured light field (or an approximation of it). A process for decoding compressed light field representation data is conceptually illustrated in FIG. 6. The process 600 includes obtaining (610) compressed light field representation data. In many embodiments, the compressed light field representation data is decompressed (611). A reference image is determined (612) and alternate view images are formed (614). In many embodiments, a determination (616) is made as to whether alternative reference images are present; if alternative reference images are present, the process 600 repeats using (618) the alternative reference image(s). Once the alternate view images are reconstructed, the captured light field is reconstructed (620).
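The control flow of process 600 can be summarized in a short sketch for the single-reference case (the (616)/(618) loop would simply repeat the same step from each secondary reference image); the dict-based container and its field names ('ref', 'depth', 'residuals', and so on) are hypothetical and chosen only for readability.

```python
import numpy as np

def decode_light_field(container, predict_fn):
    """Skeleton of the FIG. 6 flow: take the decompressed (611) fields,
    determine (612) the reference image, form (614) each alternate view
    by prediction plus stored error, and return the reconstructed (620)
    set of views keyed by viewpoint."""
    ref = container['ref']
    depth = container['depth']
    views = {container['ref_viewpoint']: ref}
    for vp, entry in container['residuals'].items():
        pred = predict_fn(ref, depth, entry['baseline'], entry['focal'])
        pred = np.where(np.isnan(pred), 0.0, pred)
        views[vp] = pred + entry['error']
    return views
```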

In a variety of embodiments, decompressing (611) the compressed light field representation data includes decompressing the reference image, depth map, and/or prediction error data compressed utilizing techniques described above. In several embodiments, the determined (612) reference image corresponds to the image from a viewpoint (e.g. a focal plane in an imager array) in the compressed light field representation data; however, it should be noted that reference images from virtual viewpoints (e.g. viewpoints that do not correspond to a focal plane in the imager array) can also be utilized as appropriate to the requirements of specific embodiments of the invention. In a number of embodiments, the alternate view images are formed (614) by computing prediction images using the determined (612) reference image and the depth map, then applying the prediction error data to the computed prediction images. However, any technique for forming (614) the alternate view images, including directly forming the alternate view images using the determined (612) reference image and the prediction error data, can be utilized as appropriate to the requirements of specific embodiments of the invention. Additionally, metadata describing the interpolation techniques utilized in the creation of the compressed light field representation data can be utilized in computing the prediction images. In this way, the decoding process results in prediction images such that, once the prediction error data is applied to them, correct (or approximately correct) alternate view images are formed (614). This allows multiple interpolation techniques to be utilized in the encoding of compressed light field representation data; e.g. adaptive interpolation techniques can be utilized based on the requirements of specific embodiments of the invention. The captured light field is reconstructed (620) using the alternate view images and the reference image. In many embodiments, the reconstructed captured light field also includes the depth map, the prediction error data, and/or any other metadata included in the compressed light field representation data.

In a variety of embodiments, multiple reference images (e.g. a primary reference image and one or more secondary reference images) exist within the compressed light field representation data. The alternate reference images can be directly included in the compressed light field representation data and/or formed utilizing techniques similar to those described above. Using (618) an alternative reference image further includes recursively (and/or iteratively) forming alternate view images from the viewpoint of each reference image utilizing techniques described above. In this way, the alternative view images are mapped back to the viewpoint of the (primary) reference image, allowing the captured light field to be reconstructed (620).

In those embodiments utilizing lossy compression techniques, information critical to determining (612) the reference image and/or an alternative reference image can be lost. However, this loss can be compensated for by storing the lost information as metadata within the compressed light field representation data and/or as part of the prediction error data. Then, when determining (612) the reference image and/or the alternate reference image, the metadata and/or prediction error data can be applied to the compressed image in order to reconstruct the original, uncompressed image. Using the uncompressed reference image, the decoding of the compressed light field representation data can proceed utilizing techniques similar to those described above to reconstruct (620) the captured light field. In a variety of embodiments, if lossy compression is used, predicting the alternative reference image from the reference image includes reconstructing the alternate reference image by coding and decoding (losslessly) the prediction error data and then adding the decoded prediction error to the reference image. In this way, the original alternate reference image can be reconstructed and used to predict specific alternate views as described above.

A specific process for decoding compressed light field representation data in accordance with an embodiment of the invention is described above with respect to FIG. 6; however, a variety of processes appropriate to the requirements of specific applications can be utilized in accordance with embodiments of the invention.

Although the present invention has been described in certain specific aspects, many additional modifications and variations would be apparent to those skilled in the art. It is therefore to be understood that the present invention can be practiced otherwise than specifically described without departing from the scope and spirit of the present invention. Thus, embodiments of the present invention should be considered in all respects as illustrative and not restrictive. Accordingly, the scope of the invention should be determined not by the embodiments illustrated, but by the appended claims and their equivalents.

What is claimed is:
1. An array camera, comprising: a processor; and a memory connected to the processor and configured to store an image processing application; wherein the image processing application configures the processor to: obtain image data, wherein: the image data comprises a set of images comprising a reference image and at least one alternate view image; and each image in the set of images comprises a set of pixels; generate a depth map based on the image data, where the depth map describes the distance from the viewpoint of the reference image with respect to objects imaged by pixels within the reference image; determine at least one prediction image based on the reference image and the depth map, where the prediction images correspond to at least one alternate view image; compute prediction error data based on the at least one prediction image and the at least one alternate view image, where a portion of prediction error data describes the difference in photometric information between a pixel in a prediction image and a pixel in at least one alternate view image corresponding to the prediction image; and generate compressed light field representation data based on the reference image, the prediction error data, and the depth map.